Accelerating Data Deduplication by Exploiting Pipelining and Parallelism with Multicore or Manycore Processors

Authors

  • Wen Xia
  • Hong Jiang
  • Dan Feng
  • Lei Tian
Abstract

As the amount of digital data grows explosively, data deduplication has gained increasing attention for its space-efficient functionality: it not only reduces the storage space requirement by eliminating duplicate data but also minimizes the transmission of redundant data in data-intensive storage systems. Most existing state-of-the-art deduplication methods remove redundant data at either the file level or the chunk level (e.g., Fixed-Sized Chunking and Content-Defined Chunking). However, the four stages of the traditional deduplication process (chunking, fingerprinting, indexing, and writing the metadata and unique data chunks) are time-consuming in storage systems; the chunking and fingerprinting stages in particular consume significant CPU resources. Since the computing power of single-core processors stagnates while the throughput of storage devices (e.g., flash and PCM) continues to increase steadily, the chunking and fingerprinting stages of deduplication are becoming much slower than the writing stage in real-world deduplication-based storage systems. More specifically, the Rabin-based chunking algorithm and the SHA-1- or MD5-based fingerprinting algorithms all need to compute hash digests, which may lengthen the write process to an unacceptable level for the write speed required in high-performance storage systems. Currently, there are two general approaches to accelerating the time-consuming hash calculation and alleviating the computing bottleneck of data deduplication, namely, software-based and hardware-based methods. The latter refers to employing a dedicated co-processor to minimize the time overheads of computing the hash function so that the deduplication-induced storage performance degradation becomes negligible or acceptable [1, 2, 4]. A good example of the hardware-based methods is StoreGPU [5], which makes full use of the computing power of the GPU to meet the computational demand of the hash calculation in storage systems.
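The four deduplication stages described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the byte-wise rolling update stands in for the actual Rabin polynomial fingerprint, and the mask and chunk-size bounds are assumed values.

```python
import hashlib

# Toy content-defined chunking: a byte-wise rolling update stands in for
# the Rabin polynomial fingerprint used in real systems. A chunk boundary
# is declared where the low bits of the hash are zero, so cut-points
# follow content rather than fixed offsets.
MASK = 0x1FFF                       # 2^13 - 1: ~8 KiB average chunk size
MIN_CHUNK, MAX_CHUNK = 2048, 65536  # assumed bounds; real systems tune these

def cdc_chunks(data: bytes):
    """Yield content-defined chunks of `data` (stage 1: chunking)."""
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF        # toy rolling update
        size = i - start + 1
        if size < MIN_CHUNK:
            continue
        if (h & MASK) == 0 or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]                     # trailing partial chunk

def deduplicate(data: bytes):
    """Run stages 2-4: fingerprint each chunk with SHA-1, look it up in
    the index, and store only unique chunks. Returns (store, recipe),
    where `recipe` is the ordered fingerprint list that rebuilds `data`."""
    store, recipe = {}, []
    for chunk in cdc_chunks(data):
        fp = hashlib.sha1(chunk).hexdigest()   # stage 2: fingerprinting
        store.setdefault(fp, chunk)            # stages 3-4: index + write
        recipe.append(fp)
    return store, recipe
```

Because every stage touches every byte, the serial cost of the rolling hash and SHA-1 dominates once the storage device is fast enough, which is exactly the bottleneck discussed above.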
The software-based approaches exploit the parallelism of data deduplication instead of employing faster computing devices. Liu et al. [5] and Guo et al. [3] have attempted to improve write performance by pipelining the Fixed-Sized Chunking (FSC) process. Because of the internal content dependency, it remains a challenge to fully exploit the parallelism in the chunking and fingerprinting tasks of Content-Defined Chunking (CDC) based deduplication approaches. In this report, we propose P-Dedupe, a deduplication system for high-performance data storage that pipelines and parallelizes the compute-intensive deduplication processes to remove the write bottleneck. P-Dedupe exploits pipelining among the deduplication data units (e.g., chunks and files) and parallelism among the deduplication functional units (e.g., the fingerprinting and chunking tasks) by making full use of the idle computing resources in a multicore- or manycore-based computer system. P-Dedupe aims to remove the time overheads of hashing and shift the deduplication bottleneck from the CPU to the I/O, so that data deduplication can be easily embedded into a normal data storage system with little or no impact on write performance.
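The idea of parallelizing a deduplication functional unit across idle cores can be sketched as follows. This is not the P-Dedupe implementation, only a minimal thread-pool illustration: chunks are fingerprinted concurrently while the order-dependent indexing and writing stages stay serial. CPython's hashlib releases the GIL for large buffers, so threads yield real speedup here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def parallel_dedup(chunks, workers=4):
    """Fingerprint chunks in parallel (stage 2), then index and write
    serially (stages 3-4). `chunks` is the output of the chunking stage.
    Returns (store, recipe) as a serial deduplicator would."""
    chunks = list(chunks)            # materialize the chunking-stage output
    store, recipe = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fingerprinting runs on worker threads; pool.map preserves order,
        # so the recipe still reconstructs the original byte stream.
        fps = pool.map(lambda c: hashlib.sha1(c).hexdigest(), chunks)
        for chunk, fp in zip(chunks, fps):
            store.setdefault(fp, chunk)   # write only unique chunks
            recipe.append(fp)
    return store, recipe
```

A fuller pipeline would additionally overlap the four stages across successive chunks and files, which is the combination of pipelining and parallelism the report advocates.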


Similar Articles

Synchronization and Pipelining on Multicore: Shaping Parallelism for a New Generation of Processors

The potential for higher performance from increasing on-chip transistor densities, on the one hand, and the limitations in instruction-level parallelism of sequential applications and in the scalability of increasingly complicated superscalar and multithreaded architectures, on the other, are leading the microprocessor industry to embrace chip multi-processors as a cost-effective solution for t...


Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures

The objective of this paper is to extend, in the context of multicore architectures, the concepts of algorithms-by-tiles [Buttari et al., 2007] for Cholesky, LU, and QR factorizations to the family of two-sided factorizations. In particular, the bidiagonal reduction of a general, dense matrix is very often used as a pre-processing step for calculating the singular value decomposition. Furthermore, i...


Performance of RDF Query Processing on the Intel SCC

Chip makers are envisioning hundreds of cores in future processors for throughput oriented computing. These processors, called manycore processors, require new architectural innovations for scaling to a large number of cores as compared with today’s multicore processors. We report an early study on the performance of RDF query processing on a manycore processor. In our study, we use the Intel S...


Towards RDF Query Processing on the Intel Single-Chip Cloud

Chip makers are envisioning hundreds of cores in future processors for throughput oriented computing. These processors, called manycore processors, require new architectural innovations for scaling to a large number of cores as compared with today’s multicore processors. We report an early study on the performance of RDF query processing on a manycore processor. In our study, we use the Intel S...


University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Synchronization for Dynamic Task Parallelism on Manycore Architectures

Manycore architectures (hundreds to thousands of cores per processor) are seen by many as a natural evolution of multicore processors. Taking advantage of this massive parallelism in practice requires a productive programming interface for parallel programming, and an efficient execution and thread-coordination runtime. Dynamic task parallelism, introduced recently in several programming langu...




Publication date: 2012